AITopics | student-t distribution

Exponential family distributions are highly useful in machine learning since their calculation can be performed efficiently through natural parameters. The exponential family has recently been extended to the t-exponential family, which contains Student-t distributions as family members and thus allows us to handle noisy data well. However, since the t-exponential family is defined by the deformed exponential, an efficient learning algorithm for the t-exponential family such as expectation propagation (EP) cannot be derived in the same way as the ordinary exponential family. In this paper, we borrow the mathematical tools of q-algebra from statistical physics and show that the pseudo additivity of distributions allows us to perform calculation of t-exponential family distributions through natural parameters. We then develop an expectation propagation (EP) algorithm for the t-exponential family, which provides a deterministic approximation to the posterior or predictive distribution with simple moment matching. We finally apply the proposed EP algorithm to the Bayes point machine and Student-t process classification, and demonstrate their performance numerically.

artificial intelligence, machine learning, t-exponential family, (18 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
North America > United States > Massachusetts (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)


Optimal Formats for Weight Quantisation

Orr, Douglas, Ribar, Luka, Luschi, Carlo

arXiv.org Artificial Intelligence, Sep-26-2025

Weight quantisation is an essential technique for enabling efficient training and deployment of modern deep learning models. However, the recipe book of quantisation formats is large and formats are often chosen empirically. In this paper, we propose a framework for systematic design and analysis of quantisation formats. By connecting the question of format design with the classical quantisation theory, we show that the strong practical performance of popular formats comes from their ability to represent values using variable-length codes. We frame the problem as minimising the KL divergence between original and quantised model outputs under a model size constraint, which can be approximated by minimising the squared quantisation error, a well-studied problem where entropy-constrained quantisers with variable-length codes are optimal. We develop non-linear quantisation curves for block-scaled data across multiple distribution families and observe that these formats, along with sparse outlier formats, consistently outperform fixed-length formats, indicating that they also exploit variable-length encoding. Finally, by using the relationship between the Fisher information and KL divergence, we derive the optimal allocation of bit-widths to individual parameter tensors across the model's layers, saving up to 0.25 bits per parameter when applied to large language models.

Weight quantisation enables large deep learning models to run on low-resource hardware and edge devices by saving space and memory bandwidth usage. It can be seen as an optimisation problem, where the goal is to retain the behaviour of the high-precision reference model while reducing the total number of bits needed to store its parameters. This naturally splits into two sub-problems of format design and quantisation procedure, both of which are highly active areas of research. We focus on the format design question, i.e., how to choose a representation space for model parameters. This is somewhat independent from the quantisation procedure, which aims to find an optimal point in that space.
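To make the notions of block scaling and squared quantisation error concrete, here is a minimal sketch of a fixed-length baseline: round-to-nearest onto a symmetric uniform grid, with each block scaled by its absolute maximum. This is not one of the paper's proposed formats; the function names, the block size of 32, and the Gaussian test weights are all assumptions made for illustration.

```python
import random

def quantise_block(block, bits=4):
    """Round each value to the nearest point of a symmetric uniform grid.

    The grid is scaled by the block's absolute maximum (an assumed,
    commonly used scaling rule, not taken from the paper).
    """
    levels = 2 ** (bits - 1) - 1           # e.g. 7 positive levels for 4 bits
    scale = max(abs(w) for w in block) / levels
    if scale == 0.0:                       # all-zero block: nothing to quantise
        return list(block)
    return [round(w / scale) * scale for w in block]

def block_quantise(weights, block_size=32, bits=4):
    """Quantise a flat list of weights block by block."""
    out = []
    for start in range(0, len(weights), block_size):
        out.extend(quantise_block(weights[start:start + block_size], bits))
    return out

def mean_squared_error(original, quantised):
    """Average squared quantisation error over all weights."""
    return sum((a - b) ** 2 for a, b in zip(original, quantised)) / len(original)

# Hypothetical test data: 1024 standard-normal "weights".
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(1024)]

err4 = mean_squared_error(weights, block_quantise(weights, bits=4))
err8 = mean_squared_error(weights, block_quantise(weights, bits=8))
print(f"4-bit MSE: {err4:.6f}, 8-bit MSE: {err8:.6f}")
```

Every value here costs the same number of bits, which is the fixed-length setting that the abstract contrasts with variable-length codes.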

kl divergence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2505.12988

Country:

North America > United States (1.00)
Asia (0.67)
Europe > Austria > Vienna (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Graphical Models in Heavy-Tailed Markets

Neural Information Processing Systems, Aug-16-2025, 14:42:46 GMT

Vectors are assumed to be column vectors.

artificial intelligence, graph, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Poland (0.04)
(8 more...)

Industry:

Banking & Finance > Trading (1.00)
Health & Medicine (0.70)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications (0.93)
Information Technology > e-Commerce > Financial Technology (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)


5adff4d5402703418f7210a4004e1314-Paper-Conference.pdf

Neural Information Processing Systems, Aug-15-2025, 01:56:40 GMT

algorithm, bipartite graph, graph, (16 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Industry:

Banking & Finance > Trading (0.69)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.69)


Design of Restricted Normalizing Flow towards Arbitrary Stochastic Policy with Computational Efficiency

Kobayashi, Taisuke, Aotani, Takumi

arXiv.org Artificial Intelligence, Dec-17-2024

This paper proposes a new design method for a stochastic control policy using a normalizing flow (NF). In reinforcement learning (RL), the policy is usually modeled as a distribution with trainable parameters. When this parameterization lacks expressiveness, it fails to acquire the optimal policy. A mixture model is capable of universal approximation, but excessive redundancy increases its computational cost, which can become a bottleneck in real-time robot control. As an alternative, NF, which adds parameters for an invertible transformation of a simple base distribution, is expected to offer high expressiveness at lower computational cost. However, NF cannot compute its mean analytically because of the complexity of the invertible transformation, and it lacks reliability because it retains stochastic behavior after deployment as a robot controller. This paper therefore designs a restricted NF (RNF) that achieves an analytic mean by appropriately restricting the invertible transformation. In addition, the expressiveness impaired by this restriction is regained by using a bimodal Student-t distribution as its base, yielding the so-called Bit-RNF. In RL benchmarks, the Bit-RNF policy outperformed the previous models. Finally, a real robot experiment demonstrated the applicability of the Bit-RNF policy to the real world. The attached video is available on YouTube: https://youtu.be/R_GJVZDW9bk

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1080/01691864.2023.2208634

2412.12894

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)


Heavy-Tailed Diffusion Models

Pandey, Kushagra, Pathak, Jaideep, Xu, Yilun, Mandt, Stephan, Pritchard, Michael, Vahdat, Arash, Mardani, Morteza

arXiv.org Machine Learning, Oct-29-2024

Diffusion models achieve state-of-the-art generation quality across many applications, but their ability to capture rare or extreme events in heavy-tailed distributions remains unclear. In this work, we show that traditional diffusion and flow-matching models with standard Gaussian priors fail to capture heavy-tailed behavior. We address this by repurposing the diffusion framework for heavy-tail estimation using multivariate Student-t distributions. We develop a tailored perturbation kernel and derive the denoising posterior based on the conditional Student-t distribution for the backward process. Inspired by $\gamma$-divergence for heavy-tailed distributions, we derive a training objective for heavy-tailed denoisers. The resulting framework introduces controllable tail generation using only a single scalar hyperparameter, making it easily tunable for diverse real-world distributions. As specific instantiations of our framework, we introduce t-EDM and t-Flow, extensions of existing diffusion and flow models that employ a Student-t prior. Remarkably, our approach is readily compatible with standard Gaussian diffusion models and requires only minimal code changes. Empirically, we show that our t-EDM and t-Flow outperform standard diffusion models in heavy-tail estimation on high-resolution weather datasets in which generating rare and extreme events is crucial.
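As background for the Student-t prior mentioned above, the following sketch draws samples from a standard multivariate Student-t distribution using its well-known representation as a Gaussian scale mixture: a Gaussian vector divided by the square root of a chi-square draw over its degrees of freedom. It does not reproduce the paper's perturbation kernel or training objective. The function name and the parameter `nu` (degrees of freedom) are assumptions, and whether `nu` corresponds to the paper's scalar hyperparameter is not confirmed by the abstract.

```python
import math
import random

def sample_student_t(dim, nu, rng):
    """Draw one dim-dimensional standard Student-t vector.

    Uses x = z / sqrt(kappa), with z ~ N(0, I) and kappa ~ chi2(nu) / nu.
    A single kappa is shared by all coordinates, which is what makes the
    result a multivariate t rather than independent univariate t draws.
    """
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # chi-square with nu degrees of freedom = Gamma(shape=nu/2, scale=2)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)
    kappa = chi2 / nu
    return [zi / math.sqrt(kappa) for zi in z]

def sample_variance(values):
    """Unbiased sample variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(0)

# With very large nu the distribution is close to a standard Gaussian.
near_gaussian = [sample_student_t(1, 1000.0, rng)[0] for _ in range(20000)]
var_near_gaussian = sample_variance(near_gaussian)

# With small nu the tails are heavy; extreme draws become much more likely.
heavy = [sample_student_t(1, 3.0, rng)[0] for _ in range(20000)]
print(var_near_gaussian, max(abs(v) for v in heavy))
```

Lowering `nu` thickens the tails, while large `nu` recovers near-Gaussian behaviour, which is one way a single scalar can control tail heaviness.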

diffusion model, objective, preprint, (16 more...)

arXiv.org Machine Learning

2410.14171

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > California > Orange County > Irvine (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)


Expectation Propagation for t-Exponential Family Using q-Algebra

Futoshi Futami, Issei Sato, Masashi Sugiyama

Neural Information Processing Systems, Oct-2-2024, 20:32:31 GMT

Exponential family distributions are highly useful in machine learning since their calculation can be performed efficiently through natural parameters. The exponential family has recently been extended to the t-exponential family, which contains Student-t distributions as family members and thus allows us to handle noisy data well. However, since the t-exponential family is defined by the deformed exponential, an efficient learning algorithm for the t-exponential family such as expectation propagation (EP) cannot be derived in the same way as the ordinary exponential family. In this paper, we borrow the mathematical tools of q-algebra from statistical physics and show that the pseudo additivity of distributions allows us to perform calculation of t-exponential family distributions through natural parameters. We then develop an expectation propagation (EP) algorithm for the t-exponential family, which provides a deterministic approximation to the posterior or predictive distribution with simple moment matching. We finally apply the proposed EP algorithm to the Bayes point machine and Student-t process classification, and demonstrate their performance numerically.
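The deformed exponential and the pseudo additivity referred to above can be illustrated with the standard definitions from Tsallis statistics. The sketch below is not the paper's EP algorithm. It only checks numerically two textbook identities: the q-product turns deformed exponentials into sums of their arguments, and the ordinary product adds a cross term. The function names are assumptions, and the symbol `q` stands in for the paper's `t`.

```python
import math

def exp_q(x, q):
    """Deformed exponential [1 + (1 - q) x]_+^(1 / (1 - q)); exp(x) when q == 1."""
    if q == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        if q < 1.0:
            return 0.0                     # truncated by the [.]_+ operation
        raise ValueError("exp_q diverges for this argument when q > 1")
    return base ** (1.0 / (1.0 - q))

def log_q(x, q):
    """Deformed logarithm (x^(1 - q) - 1) / (1 - q), the inverse of exp_q for x > 0."""
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def q_product(a, b, q):
    """q-product [a^(1 - q) + b^(1 - q) - 1]_+^(1 / (1 - q)) for a, b > 0."""
    if q == 1.0:
        return a * b
    base = a ** (1.0 - q) + b ** (1.0 - q) - 1.0
    if base <= 0.0:
        if q < 1.0:
            return 0.0
        raise ValueError("q-product is undefined for these arguments when q > 1")
    return base ** (1.0 / (1.0 - q))

q, x, y = 1.5, 0.3, 0.4                    # q > 1 gives heavy, Student-t-like tails

# q-product of deformed exponentials adds the arguments.
lhs_qprod = q_product(exp_q(x, q), exp_q(y, q), q)
rhs_qprod = exp_q(x + y, q)

# Ordinary product is only pseudo-additive: a cross term (1 - q) x y appears.
lhs_pseudo = exp_q(x, q) * exp_q(y, q)
rhs_pseudo = exp_q(x + y + (1.0 - q) * x * y, q)

print(lhs_qprod, rhs_qprod, lhs_pseudo, rhs_pseudo)
```

The first identity is what lets products of such factors be tracked by adding arguments, in the same spirit as adding natural parameters in the ordinary exponential family.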

classification, exponential family, t-exponential family, (16 more...)

Neural Information Processing Systems

Country: